Case Study 4 - Time Series


Time Series Analysis of Seasonal Flu


Martin Garcia, Michael Catalano, Jeremy Otsap, Christian Nava
July 1, 2020

Introduction


The seasonal flu is a public health concern that requires a large mitigation effort. Increased costs of medical care, loss of productivity, and death are all attributed to the disease each year. The average annual economic impact in the U.S. is estimated at approximately $11.2 billion.1

As of June 25, 2020, preliminary estimates of the number of flu deaths for the 2019-2020 flu season range between 24,000 and 62,000.2 The Centers for Disease Control and Prevention (CDC) use multiple methods to estimate and track flu activity in order “to provide a national picture of flu activity.”3

Having a forecast of the incidence of influenza, or the rate of positive results for influenza tests, can aid public health officials in having a better understanding of how the disease is affecting the nation. It can also help public health administrators prepare a response that can most effectively allocate the appropriate resources.

This case study attempts to model the national rate of infection with an autoregressive integrated moving average (ARIMA) model fit to publicly available flu data. Weekly flu data are taken from the World Health Organization’s (WHO) FluNet database from October 04, 2010 through June 07, 2020.

The data was cleaned and only 4 of the 22 variables were kept: Week, SDATE, SPEC_PROCESSED_NB, and ALL_INF. The latter three were renamed to wk_date.start, total_specimens.tested, and total_flu_cases.positive, respectively. An additional variable, total_flu_cases.percent_positive, was created, which is a percentage of the total specimens processed that tested positive for any strain of influenza.
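The derived variable is a straightforward ratio. A minimal sketch (illustrative Python; the original analysis was done in R):

```python
def percent_positive(specimens_tested, flu_cases_positive):
    """total_flu_cases.percent_positive: share of processed specimens that
    tested positive for any strain of influenza, as a percentage."""
    if specimens_tested == 0:
        return 0.0  # avoid division by zero on weeks with no tests
    return 100.0 * flu_cases_positive / specimens_tested
```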

In the plot below, the weekly positive rate (the percentage of processed specimens that tested positive for any strain of influenza) is plotted against the weekly number of tests processed. The positive rate does not appear to vary much from year to year, even when the number of tests processed increases.

Statistics that allow time series data to be accurately described at all time points require the time series to be stationary. A stationary time series is one whose statistical properties, such as the mean, variance, and autocorrelation, are constant over time. For a time series to be considered stationary, the following conditions must be met:

  1. The mean does not depend on time.
  2. The variance is finite and does not depend on time.
  3. The correlation between data points only depends on how far apart they are in time and not where they are in time.

Estimating means, variances, and autocorrelations from a single realization requires these three stationarity conditions to hold. A constant mean implies that the average positive rate does not change over time: if the same year were repeated with different realizations, those realizations would share the same mean, so all the observations can be pooled to estimate it. Similarly, if the variance is finite and constant across years, all the data can be used to estimate the common variance. The last condition requires the correlation between data points to depend only on how far apart they are in time, not on where they fall in the timespan of the data.
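An informal way to probe the first two conditions from a single realization is to compare summary statistics across halves of the series. A sketch (illustrative Python, not a formal test):

```python
from statistics import mean, variance

def split_half_summary(series):
    """Compare the mean and sample variance of the two halves of a series.
    A large discrepancy between halves is informal evidence against the
    constant-mean and constant-variance conditions."""
    half = len(series) // 2
    first, second = series[:half], series[half:]
    return {
        "mean": (mean(first), mean(second)),
        "variance": (variance(first), variance(second)),
    }
```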

From a visual inspection it is difficult to determine if the mean is constant over time. If the mean were constant, every time period would have the same mean, i.e., the mean for December would be the same for every year. Below is a plot of the mean positive rate for every week of the year. The plot shows there are weekly differences in the mean, suggesting that the expected value of the time series depends on time. This raises an initial concern regarding stationarity, which will be addressed later.

The third condition requires that the correlation between data points (the autocovariance) depend only on how far apart they are in time and not on where they are in time (i.e., constant autocovariance). The ACF plots below split the data in half to see if the autocorrelations change over time. Autocorrelations that change over time would imply a non-stationary time series. Comparing the first half of the data to the second half shows the ACFs are not identical, which suggests the autocovariance of the data is not constant over time.
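The split-half comparison rests on the sample autocorrelation function. A from-scratch version of the estimator (illustrative Python; R's acf computes the same quantity):

```python
def sample_acf(x, max_lag):
    """Sample ACF r_k = c_k / c_0, where
    c_k = (1/n) * sum_{t=1}^{n-k} (x_t - xbar) * (x_{t+k} - xbar)."""
    n = len(x)
    xbar = sum(x) / n
    c0 = sum((v - xbar) ** 2 for v in x) / n
    return [
        sum((x[t] - xbar) * (x[t + k] - xbar) for t in range(n - k)) / (n * c0)
        for k in range(max_lag + 1)
    ]
```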

The data is log-transformed to normalize values and stabilize the non-constant variance.

A comparison of the ACF plots for the log-transformed data shows the first half closely matches the second half, which suggests constant autocovariance over time.

ACF and Spectral Density

The autocorrelations exhibit a damped sinusoidal pattern converging to zero, which is characteristic of complex conjugate roots, as in an AR(2) process.

Additionally, when looking at the spectral density, which helps identify the frequency content of a time series, there is a significant peak near 0, suggesting complex roots and possible seasonality in the data. There appears to be cyclic behavior in the time series data, which could imply seasonality and a non-stationary process. In the first spectral density plot there is a large peak near zero. When the truncation point is changed to 20, this peak is at approximately 0.02, or 1/50, which indicates a period of approximately one year (50 weeks).

However, it is important to note that time series data with cyclic behavior and no trend or seasonality is considered stationary if the cycles are not of a fixed length.4 Cyclic behavior should not be confused with seasonal behavior. If the fluctuations in the data are not of a fixed period, then they are considered cyclic. If the period of the cycle is fixed, then the pattern is seasonal.

From Table 1 below, it is evident that the cycles (measuring from peak to peak) are not of a fixed length and display aperiodic cyclic behavior. The number of weeks that elapse between peaks for the time series varies between 41 and 63 weeks. Intuitively, this makes sense as the peak of the flu “season” doesn’t necessarily fall on the same week or month every year. Therefore, it can be argued that the pattern is not seasonal but cyclic.

Table 1: Flu Season Peak Week

Flu Season   Peak Week Start Date   Peak Week Number   # of Weeks Between Peaks   Positive Rate
2010-2011    January 31, 2011        5                 N/A                        35.49%
2011-2012    March 12, 2012         11                 58                         31.90%
2012-2013    December 24, 2012      52                 41                         38.18%
2013-2014    December 23, 2013      52                 52                         30.61%
2014-2015    December 22, 2014      52                 52                         32.37%
2015-2016    March 07, 2016         10                 63                         28.59%
2016-2017    February 20, 2017       8                 50                         28.17%
2017-2018    January 08, 2018        2                 46                         30.50%
2018-2019    February 25, 2019       9                 59                         29.58%
2019-2020    February 03, 2020       6                 49                         32.74%
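The peak-to-peak spacings in Table 1 can be reproduced directly from the peak-week start dates. A sketch using the first three seasons (illustrative Python, dates taken from Table 1):

```python
from datetime import date

# Peak-week start dates for the first three seasons in Table 1.
peaks = [
    date(2011, 1, 31),   # 2010-2011 season
    date(2012, 3, 12),   # 2011-2012 season
    date(2012, 12, 24),  # 2012-2013 season
]

def weeks_between(d1, d2):
    """Whole weeks elapsed between two dates."""
    return (d2 - d1).days // 7

gaps = [weeks_between(a, b) for a, b in zip(peaks, peaks[1:])]
# gaps reproduces the 58- and 41-week spacings reported in Table 1
```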

To lend some validation to the initial assessments that the data lack seasonality and trend, more formal tests for stationarity and trend are explored.

Dickey-Fuller Test for Stationarity

Employing an augmented Dickey-Fuller test helps determine if one or more seasonal factors should be included in the model. It tests the null hypothesis that the autoregressive model has a unit root, i.e., a root on the unit circle, implying non-stationarity. Failing to reject the null hypothesis leaves a unit root open as a possibility; however, failing to reject is not positive evidence that a unit root (i.e., seasonal factor) exists.

A p-value > 0.05 for the augmented Dickey-Fuller test would fail to reject the null hypothesis, which means that a unit root, or one or more seasonal factors, may be present. In the case of this data, the augmented Dickey-Fuller test rejects the null hypothesis with a p-value of 0.01, suggesting there is no unit root and supporting the initial visual inspection of the data.


    Augmented Dickey-Fuller Test

data:  ts.log_flu
Dickey-Fuller = -5.6823, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary
Cochrane-Orcutt Test to Check for Trend

To check for a trend, the Cochrane-Orcutt procedure is employed: a linear trend is fit to the data while accounting for serial correlation in the residuals, and the null hypothesis that the slope is zero (i.e., no trend) is tested. The Cochrane-Orcutt test yields a p-value of 0.9133 and fails to reject the null hypothesis, which suggests there is no trend.

Per the initial visual inspection, the Cochrane-Orcutt test, and the Dickey-Fuller test, the data for the model will be assumed to be stationary with no trend and no seasonal factors.

Model Selection Methodology

The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are used in this case study to select candidate models. Both measures score a model based on its log-likelihood and complexity. The AIC aims to reduce the white noise variance in the model and penalizes models that add additional terms. The BIC is concurrently employed because the AIC tends to select higher-order models, e.g., it may propose an ARMA(2,2) over an ARMA(1,1) model. The BIC imposes stronger penalties for increasing the orders of \(p\) and \(q\) and tends to select models with fewer parameters. Lower values of AIC and BIC are preferred.
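In their common textbook form (software packages, including tswge, differ in constants and scaling), the two criteria can be written as:

```python
from math import log

def aic(log_likelihood, k):
    """AIC = -2 ln(L) + 2k, where k is the number of estimated parameters
    (p + q + 1 for an ARMA(p, q) model with a noise-variance term)."""
    return -2.0 * log_likelihood + 2.0 * k

def bic(log_likelihood, k, n):
    """BIC = -2 ln(L) + k ln(n); since ln(n) > 2 once n exceeds about 8,
    BIC penalizes each extra parameter more heavily than AIC."""
    return -2.0 * log_likelihood + k * log(n)
```

For the roughly 500 weekly observations here, ln(n) is about 6.2, so BIC's per-parameter penalty is roughly three times AIC's, which is why BIC favors lower-order models.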

The function aic5.wge from the tswge package is used to identify the models with the lowest AIC and BIC. The AIC identifies a potential ARMA(3,1) model.



Five Smallest Values of  aic 
p q aic
11 3 1 -3.178264
17 5 1 -3.177504
15 4 2 -3.172760
14 4 1 -3.172645
18 5 2 -3.172168

The BIC method also identifies an ARMA(3,1) model as the best structure for the data.



Five Smallest Values of  bic 
p q bic
11 3 1 -3.109790
9 2 2 -3.098492
8 2 1 -3.092834
14 4 1 -3.090476
17 5 1 -3.081640

Candidate Model - ARMA(3,1)



Coefficients of Original polynomial:  
2.2175 -1.4737 0.2445 

Factor                 Roots                Abs Recip    System Freq 
1-1.9692B+0.9848B^2    0.9998+-0.1258i      0.9924       0.0199
1-0.2483B              4.0272               0.2483       0.0000
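The factor table above can be checked by hand. For a second-order factor \(1 - \phi_1 B - \phi_2 B^2\) with complex conjugate roots, the absolute reciprocal of the roots is \(\sqrt{-\phi_2}\) and the system frequency is \(f_0 = \frac{1}{2\pi}\cos^{-1}\left(\phi_1 / (2\sqrt{-\phi_2})\right)\). Applying this to the dominant factor (illustrative Python):

```python
from math import acos, pi

# Dominant second-order factor from the table: 1 - 1.9692B + 0.9848B^2,
# i.e., phi1 = 1.9692 and phi2 = -0.9848 in the form 1 - phi1*B - phi2*B^2.
phi1, phi2 = 1.9692, -0.9848

abs_recip = (-phi2) ** 0.5                    # absolute reciprocal of the roots
f0 = acos(phi1 / (2 * abs_recip)) / (2 * pi)  # system frequency (cycles/week)
period_weeks = 1 / f0                         # about 50 weeks, i.e., ~1 year
```

This reproduces the 0.9924 and 0.0199 values in the factor table and ties the dominant system frequency to the roughly one-year cycle seen in the spectral density.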
  
  

ARMA(\(p,\,q\)) models assume that the noise component, \(a_t\), of the model is white noise. If the residuals are not white noise, this suggests that further modeling may be necessary to better explain the behavior in the data. A visual inspection and a more formal test are employed to determine if the residuals are white noise.

Visual Inspection of Residuals

The residuals look random per the realization plot, suggesting white noise.

The plot of the sample autocorrelations shows lag 25 outside the confidence bands. However, at a 95% confidence level, 1 out of every 20 lags is expected to be outside the bands. Therefore, per a visual inspection, the residuals appear to be white noise.

Ljung-Box Test for residuals

A visual inspection looks at each autocorrelation separately. An alternative approach to checking the residuals for white noise is performing a Ljung-Box test. The Ljung-Box test approaches the autocorrelations as a group to determine if the residuals are white noise. It tests the null hypothesis (\(H_0\)) that all autocorrelations (\(\rho\)) are zero (i.e., the residuals are white noise). \[ H_0: \rho_1 = \rho_2 = ... = \rho_K = 0\]

If at least one autocorrelation is not zero, then white noise is not present. \[H_a: at\;least\;one\; \rho_k \neq 0, \, for\,\, 1 \leq k \leq K\]
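The tswge function ljung.wge computes this statistic; a from-scratch sketch of the same calculation (illustrative Python) is:

```python
def ljung_box(residual_acf, n, num_params=0):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1}^{K} r_k^2 / (n - k).

    residual_acf holds the sample autocorrelations r_1..r_K of the residuals;
    under H0 (white noise), Q is approximately chi-square with K - num_params
    degrees of freedom, where num_params = p + q for an ARMA(p, q) fit."""
    K = len(residual_acf)
    q = n * (n + 2) * sum(
        r ** 2 / (n - k) for k, r in enumerate(residual_acf, start=1)
    )
    return q, K - num_params
```

With K = 24 and the ARMA(3,1) fit, the degrees of freedom are 24 - 4 = 20, matching the tswge output.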

In the tswge package, the residuals are found in the output variable $res. These are calculated within the functions est.ar.wge and est.arma.wge.

The Ljung-Box test yields \(p > 0.05\), which fails to reject the null hypothesis and suggests the residuals are white noise.

$test
[1] "Ljung-Box test"

$K
[1] 24

$chi.square
[1] 14.7704

$df
[1] 20

$pval
[1] 0.7893909

As a second check, a different K value is used, which also yields \(p > 0.05\), suggesting the residuals are white noise.

$test
[1] "Ljung-Box test"

$K
[1] 48

$chi.square
[1] 33.38016

$df
[1] 44

$pval
[1] 0.8781805

Per a visual inspection and the Ljung-Box test, the residuals are white noise. This lends confidence in the model selected by the AIC and BIC methods.

Forecasting the Data

A plot of the forecast for the last year of data (52 weeks) suggests the model does a fairly good job of predicting one year’s worth of data. That said, the model may be better suited to shorter forecast horizons.

Performance Metric

The average squared error (ASE) will be used to measure the goodness of fit of the model (performance). The ASE measure takes the sum of the square of the difference between the predicted value (forecast), \(\hat X_i\), and the actual value, \(X_i\). It then averages the error over \(n\) number of observations. A lower ASE value indicates the model made fewer forecast errors.

\[ASE = \frac{\sum(\hat X_i - X_i)^2}{n}\]
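The metric itself is a one-liner; a minimal implementation (illustrative Python):

```python
def ase(forecasts, actuals):
    """Average squared error between forecasts X̂_i and observed values X_i."""
    if len(forecasts) != len(actuals):
        raise ValueError("forecasts and actuals must have the same length")
    return sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals)
```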

It should be noted that the ASE is a snapshot in time and can vary for the same data set depending on the size of the training data. A more stable approach is to use a rolling window ASE, where the training data is a “window” of observations (a window of time).

The model is slid across the whole dataset, and at each window position the ASE measures how well it predicts the next \(n\) observations.

This is a process similar to cross-validation, where nearly all of the data can be used to train and validate the model.

The training dataset will consist of at least 130 weeks, or 2.5 years, of data. The forecast horizon will be used as the validation set.
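The rolling window procedure can be sketched as follows; `fit_and_forecast` is a hypothetical stand-in for the actual model-fitting and forecasting step (done with tswge's est.arma.wge and its forecasting functions in the original R analysis):

```python
def rolling_window_ase(series, fit_and_forecast, window=130, horizon=2):
    """Slide a fixed-size training window across the series; at each position
    fit the model on the window, forecast `horizon` steps ahead, and record
    the ASE against the observations that immediately follow the window."""
    ases = []
    for start in range(len(series) - window - horizon + 1):
        train = series[start:start + window]
        actual = series[start + window:start + window + horizon]
        forecast = fit_and_forecast(train, horizon)
        ases.append(sum((f - a) ** 2 for f, a in zip(forecast, actual)) / horizon)
    return ases
```

With a naive last-value forecaster as a placeholder model, the function can be exercised end to end; in the actual analysis each window would instead be fit with the ARMA(3,1) model.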


Ideally, for the rolling window ASE charts below, a low and steady ASE value (red line) as compared to the observed values (blue line) is preferred. This would indicate that the model did a good job of predicting most, if not all, the observed values. Spikes in the ASE value represent observed values that were not predicted well, areas of large error.

From the rolling window ASE charts above and Table 2 below, the performance of the model deteriorates as the horizon increases.


Conclusion

When fitting an ARMA model to a set of data, the goal is to explain as much of the variability in the data as is reasonably possible. The ARMA(3,1) model does a good job at forecasting the number of weekly positive flu cases for a 2-week horizon. The performance of the model deteriorates as the forecast horizon increases from two weeks (mean rolling window ASE = 0.104) to 6 months (mean rolling window ASE = 0.751) as shown in Table 2.

References

[1] Putri, W., Muscatello, D. J., Stockwell, M. S., & Newall, A. T. (2018). Economic burden of seasonal influenza in the United States. Vaccine, 36(27), 3960–3966. https://doi.org/10.1016/j.vaccine.2018.05.057. Accessed June 25, 2020.

[2] “2019-2020 U.S. Flu Season: Preliminary Burden Estimates”, Centers for Disease Control and Prevention, Available from: https://www.cdc.gov/flu/about/burden/preliminary-in-season-estimates.htm. Accessed June 25, 2020.

[3] “Why CDC Estimates the Burden of Seasonal Influenza in the U.S.”, Centers for Disease Control and Prevention, Available from: https://www.cdc.gov/flu/about/burden/why-cdc-estimates.htm. Accessed June 25, 2020.

[4] Hassanien, Aboul Ella & Shaalan, Khaled & Gaber, Tarek & Azar, Ahmad & Tolba, Mohamed. (2017). Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016, p. 221.

Christian Nava

6/17/2020